Reducing spectral mismatches in concatenative speech synthesis via systematic database enrichment
Abstract
This paper presents work carried out for the Time-Domain TTS system, which is being developed at the ILSP for the Greek language. It focuses on improving synthetic speech quality by reducing spectral mismatches between concatenated segments. To that end, a study was performed to determine which spectral distance best predicts whether a spectral mismatch is audible. Several spectral distances were compared experimentally, and the best-performing one was used to systematically enrich the segment database, which initially contained only one instance per segment. The results of this procedure indicate a substantial improvement in synthetic speech quality.
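The abstract names neither the winning distance nor the exact enrichment rule. As an illustration only, the Python sketch below assumes that each segment edge is described by a cepstral-like feature vector, uses a plain Euclidean distance as a stand-in for whichever spectral distance performed best, and treats any join whose distance exceeds a threshold as an audible mismatch. The segments involved in such joins are then counted as candidates for additional recorded instances. The function names, threshold, segment labels, and feature values are all hypothetical.

```python
from collections import Counter
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one spectral feature vector (e.g. cepstral
# coefficients) per segment edge; a join is (left_segment, right_segment).
Vector = Sequence[float]
Join = Tuple[str, str]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length spectral feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_mismatches(
    joins: Dict[Join, Tuple[Vector, Vector]], threshold: float
) -> List[Join]:
    """Return, sorted, the joins whose boundary distance exceeds the threshold.

    Each join maps to the feature vector of the last frame of the left
    segment and of the first frame of the right segment.
    """
    return sorted(
        pair
        for pair, (left_end, right_start) in joins.items()
        if euclidean_distance(left_end, right_start) > threshold
    )


def enrichment_candidates(flagged: List[Join]) -> Dict[str, int]:
    """Count how many flagged joins each segment takes part in.

    Segments with high counts are (hypothetical) candidates for recording
    additional instances in the database.
    """
    counts: Counter = Counter()
    for left, right in flagged:
        counts[left] += 1
        counts[right] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy boundary features; the values are invented for illustration only.
    joins = {
        ("ka", "al"): ([1.0, 0.5, 0.2], [1.1, 0.4, 0.2]),
        ("ma", "as"): ([0.9, 0.3, 0.1], [2.0, -0.5, 0.6]),
    }
    flagged = flag_mismatches(joins, threshold=0.5)
    print(flagged)                         # [('ma', 'as')]
    print(enrichment_candidates(flagged))  # {'ma': 1, 'as': 1}
```

Trying a different candidate distance only requires replacing `euclidean_distance`; checking how well each candidate's flags agree with listeners' judgements of audibility would loosely mirror the comparison of distances that the abstract describes.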
Similar sources
On the Detection of Discontinuities in Concatenative Speech Synthesis
Over the last decade, considerable work has been done on finding an objective distance measure that is able to predict audible discontinuities in concatenative speech synthesis. Speech segments in concatenative synthesis are extracted from disjoint phonetic contexts, and discontinuities in spectral shape and phase mismatches tend to occur at unit boundaries. Many feature sets, most of them of spectral na...
Efficient Speech Synthesis System using the Deterministic plus Stochastic Model
In this paper, a high-quality concatenative synthesis system using the deterministic plus stochastic model of speech is described, in which prosodic modifications are performed by means of very simple and efficient operations, as we reported in a previous work [11]. In particular, pitch synchrony is not necessary, and linear interpolations replace other types of estimation. The method for...
Spectral smoothing for concatenative speech synthesis
This paper addresses the topic of performing effective concatenative speech synthesis with a limited database by proposing methods to smooth the transitions between speech segments. The objective is to produce natural-sounding speech via segment concatenation when formants and other spectral features do not align properly. We propose several methods for adjusting the spectra between waveform segm...
Removing linear phase mismatches in concatenative speech synthesis
Many current text-to-speech (TTS) systems are based on the concatenation of acoustic units of recorded speech. While this approach is believed to lead to higher intelligibility and naturalness than synthesis-by-rule, it has to cope with the issues of concatenating acoustic units that have been recorded at different times and in a different order. One important issue related to the concatenation...
Synchronization of speech frames based on phase data with application to concatenative speech synthesis
Synchronization of speech frames is an important issue in a concatenative speech synthesis system. In signal-processing terms, this translates into removing linear phase mismatches between concatenated speech frames. This paper presents two novel approaches to the problem of synchronizing speech frames, with an application to concatenative speech synthesis. Both methods are based on a pr...